Method and system for voice activating web pages

ABSTRACT

A method for providing a web page having an audio interface. The method including providing data specifying a web page, including in the data a first rule based grammar statement having a first phrase portion, a first command portion and a first tag portion, and including in the data a second rule based grammar statement having a second phrase portion, a second command portion, and a second tag portion.

CROSS REFERENCE TO RELATED PRIORITY APPLICATIONS

This application claims priority to International Application Serial No. PCT/US01/45223, filed Nov. 30, 2001, which claims priority to U.S. Provisional Patent Application entitled “Hyper-Speech Markup Language/Hyper-Voice Markup Language (HSML/HVML), A Technology for Voice Activating Visual Web Pages,” Ser. No. 60/250,809, filed on Dec. 1, 2000, which is hereby incorporated by reference into this application in its entirety.

BACKGROUND OF THE INVENTION

Over the past decade Automated Speech Recognition (ASR) systems have progressed to the point where a high degree of recognition accuracy may be obtained by ASR systems installed on moderately priced personal computers and workstations. This has led to a rise in the number of ASR systems available for consumer and industry applications.

ASR systems rely on voice grammars to recognize vocal commands input via a microphone and act on those commands. Voice grammars fall into two categories: rule based grammars and free speech grammars. Rule based grammars allow the recognition of a limited set of predefined phrases. Each rule based grammar, if invoked, causes an event or set of events to occur. A rule based grammar is invoked if an utterance, input via a microphone, matches a speech template corresponding to a phrase stored within the set of predefined phrases. For example, the user may say “save file” while editing a document in a word processing program to invoke the save command. On the other hand, free speech grammars recognize large sets of words in a given domain such as Business English. These grammars are generally used for dictation applications; some examples of these systems are Dragon Naturally Speaking and IBM Viavoice 7 Millennium. ASR systems have also incorporated text to speech (TTS) capabilities which enable ASR systems to speak graphically rendered text using a synthesized voice. For example, an ASR system can read a highlighted paragraph within a word processor aloud through speakers.

ASR systems have been integrated with web browsers to create voice enabled web browsers. Voice enabled web browsers allow the user to navigate the Internet by using voice commands which invoke rule based grammars. Some of the voice commands used by these browsers include utterances that cause the software to execute traditional commands used by web browsers. For example, if the user says “home” into a microphone, a voice enabled browser would execute the same routines that the voice enabled web browser would execute if a user clicked on the “home” button of the voice enabled web browser. In addition, some voice enabled web browsers create rule based grammars based on web page content. As a web page is downloaded and displayed, some voice enabled web browsers create rule based grammars based on the links contained within the web page. For example, if a web page displayed a link “company home,” such a voice enabled web browser would create a rule based grammar, effective while the web page is displayed, such that if a user uttered the phrase “company home” into a microphone the voice enabled web browser would display the web page associated with the link. One shortcoming of this approach is that the rules generated from web page content are fixed over long periods of time because web pages are not redesigned often. Additionally, the rule based grammars are generated from web page content, which is primarily intended for visual display. In effect, these systems limit the user to saying what appears on the screen.

Web pages can also incorporate audio elements, which cause sound to be output. Currently, web pages can incorporate audio elements in two ways. The first way to incorporate an audio element is to use audio wave file content to provide a human sounding voice to a web page. Using audio wave files allows the web page designer to design the visual and audio portions of the web page independently, but this freedom and added functionality comes at a high price. The bandwidth required to transfer binary sound files over the Internet to the end user is large. The second way to incorporate an audio element is to leverage the functionality of an ASR system. Voice enabled web browsers may utilize the TTS functionality of an ASR system in such a way as to have the computer “speak” the content of a web page. Using this approach causes the bandwidth needed to view the page with or without the audio element to be approximately the same but limits the subject matter of what the web browser can speak to the content of the web page.

Voice XML (VXML) affords a web page designer another option. VXML allows a user to navigate a web site solely through the use of audio commands typically used over the phone. VXML requires that a TTS translator read a web page to a user by translating the visual web page to an audio expression of the web page. The user navigates the web by speaking the links the user wants to follow. With this approach a user can navigate the Internet by using only the user's voice, but the audio content is typically generated from web page content that is primarily designed for visual interpretation, and the visual interface is removed from the user's experience.

Accordingly, there exists a continuing need to independently create an audio component of a web page that does not demand a large amount of transmission bandwidth and exists in conjunction with the visual component of a web page.

SUMMARY OF THE INVENTION

The present invention in one aspect is a method for creating an audio interface for a visual web page. In accordance with a first exemplary embodiment of the method of the present invention, there is provided a method for providing a web page having an audio interface, including providing data specifying a web page; including in the data a first rule based grammar statement having a first phrase portion, a first command portion and a first tag portion; and including in the data a second rule based grammar statement having a second phrase portion, a second command portion, and a second tag portion.

According to a second exemplary embodiment of the method of the present invention, there is provided a method for receiving a web page having an audio interface, including receiving data specifying a web page, the data including a first rule based grammar statement having a first phrase portion, a first command portion and a first tag portion, and a second rule based grammar statement having a second phrase portion, a second command portion, and a second tag portion; storing at least a portion of the first rule based grammar; storing at least a portion of the second rule based grammar; providing the data specifying the web page to a web browser; providing the first phrase portion and the first tag portion as a first recognition grammar to an automated speech recognition engine; and providing the second phrase portion and the second tag portion as a second recognition grammar to the automated speech recognition engine.

In another aspect of the present invention there is provided a system for receiving a web page having an audio interface. According to an exemplary embodiment of the system of the present invention, there is provided a first input for receiving data specifying a web page, the data including a first rule based grammar statement having a first phrase portion, a first command portion, and a first tag portion, and including a second rule based grammar statement having a second phrase portion, a second command portion, and a second tag portion; a database for storing at least a portion of the first rule based grammar statement and for storing at least a portion of the second rule based grammar statement; a web browser for receiving the data specifying the web page; and an automated speech recognition engine for receiving the first phrase portion and the first tag portion as a first recognition grammar, and for receiving the second phrase portion and the second tag portion as a second recognition grammar.

BRIEF DESCRIPTION OF THE DRAWINGS

Further objects, features, and advantages of the invention will become apparent from the following detailed description taken in conjunction with the accompanying figures showing illustrative embodiments of the invention, in which:

FIG. 1 is a block diagram illustrating a prior art system;

FIG. 2A is a flow chart illustrating a process of web page content generation in accordance with the present invention;

FIG. 2B is a block diagram of a hyper-speech markup language (HSML) enabled web browser system in accordance with the present invention;

FIG. 3A is a flow chart illustrating a thread that monitors messages sent to a particular port of a computer in accordance with the present invention;

FIG. 3B is a table illustrating the conditions under which a header, boundary, or footer are appended or prepended to the data received from the web server in accordance with the present invention;

FIG. 4 is a flow chart illustrating a thread that receives data from the web browser and data sent to the web browser associated with web pages which were invoked by the web browser in accordance with the present invention;

FIG. 5 is a flow chart illustrating a thread which is the method through which the HSML engine reacts to receiving an array of tags from the ASR engine in accordance with the present invention;

FIG. 6 is a flow chart illustrating a thread that receives sub-rule tags from the ASR engine and data sent to the web browser associated with web pages which were invoked by the ASR engine in accordance with the present invention;

FIG. 7 is a flow chart illustrating a process for creating, storing and transmitting rule based grammars and TTS grammars in accordance with the present invention;

FIG. 8 is a flow chart illustrating a process for parsing block-sets of hyper-speech markup language grammars in accordance with the present invention;

FIG. 9 is a flow chart illustrating a process for parsing blocks of hyper-speech markup language grammars in accordance with the present invention;

FIG. 10 is a flow chart illustrating a process for parsing hyper-speech markup language text-to-speech grammars in accordance with the present invention;

FIG. 11 is a flow chart illustrating a process for parsing hyper-speech markup language rule based grammars in accordance with the present invention.

Throughout the figures, unless otherwise stated, the same reference numerals and characters are used to denote like features, elements, components, or portions of the illustrated embodiments. Moreover, while the subject invention will now be described in detail with reference to the figures, and in connection with the illustrative embodiments, various changes, modifications, alterations and substitutions to the described embodiments will be apparent to those skilled in the art without departing from the true scope and spirit of the subject invention as defined by the appended claims.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1 illustrates a prior art system 100 for viewing and interacting with web pages stored on a web server 102. The web server 102 including a CPU 104, a network interface 106, and a data storage unit 108 is provided. The data storage unit 108 contains information describing one or more web pages. A network connection 109 connects the web server 102 to a communications network 110 via the network interface 106, allowing the web server 102 to communicate with other devices on the communications network 110. In a certain embodiment, the communications network 110 is the Internet.

A client computer 120 including a CPU 122, a network interface 124, and a data storage unit 126 is provided. In a certain embodiment, the data storage unit 126 can be memory. A network connection 111 connects the client computer 120 to the communications network 110 via the network interface 124, allowing the client computer 120 to communicate with other devices on the communications network 110. The client computer 120 is also connected to various input/output devices such as a keyboard 128, a mouse 136, a microphone 130, a speaker 132, and a display 134, typically through a system bus 138.

A personal digital assistant (PDA) 140, including a CPU 142 and a memory 144, is also provided. The PDA 140 is connected to and can communicate with other devices on the communications network 110 over a network connection 148, through a network interface 146.

In order to view a web page a user opens a web browser, stored in a data storage unit in a computer or a memory in a PDA, for example the data storage unit 126 in the client computer 120. Once the web browser is open the user may key in a universal resource locator (URL), which causes the client computer 120 to issue a request over the communications network 110 for the data files describing the contents of the web page identified by the URL. A web server, which stores the data files for the web page identified by the URL, for example the web server 102, receives the request and sends the client computer 120 the data files which describe the contents of the web page. The data files may include hyper-text markup language (HTML) files, active server pages files, sound files, video files, etc. The web browser then displays the web page and plays any video or audio files on the client computer 120 as specified by the markup language. The markup language specifies where and when to place or play text, video, or audio content.

FIG. 2A depicts a process 200 for designing a web page. The process 200 begins with a web page designer executing a web page design tool at step 202. The web page design tool assists the web page designer during the process of designing a web page. The web page design tool allows the web page designer to specify elements of the web page at a high level, then generates low level markup language based on the specified elements of the web page.

The web page designer begins designing and testing the visual portion of a web page at step 204. If the visual portion of the web page already exists and does not need to be updated, the process 200 advances directly to step 206. Designing and testing the visual portion of the web page is well known in the art. After the visual portion of the web page is designed and tested, the process block 206 is executed.

At step 206, the process of designing and testing an audio portion of the web page takes place. The web page designer may add hyper-speech markup language (HSML) to the web page markup language to augment the visual portion of the web page with an audio component. The web page designer may add HSML directly or specify it at a high level using a web page design tool. If the audio portion of the web page already exists and does not need to be updated, the process 200 advances directly to step 208. At step 208, the web pages can be made available to users connected to the communications network 110. Once the web pages are made available to the public, the process 200 ends.

In an alternate embodiment, the audio portion of the web page could be designed before the visual portion of the web page is designed. In another alternate embodiment, the audio and visual portions of the web page could be designed at the same time. In yet another alternate embodiment, the audio and visual portions of the web page could be designed in lock step with each other.

In an alternate embodiment, the audio and visual portions of the web page may be generated by a common gateway interface.

According to the present invention, HSML specifies speech recognition rule based grammars and TTS grammars for the web page. Speech recognition rule based grammars allow the recognition of a limited set of predefined speech phrases input to the computer through a microphone or the like, which in turn invoke an event or set of events. TTS grammars define text that can be output as audible content through speakers associated with the computer displaying a web page.

HSML rule based grammars may be related to, but need not be the same as, the text which appears on the screen. If HSML rule based grammars are added to the markup language of a web page that is displayed on the screen, users may speak about the text displayed on the screen rather than simply speaking the text displayed on the screen itself. A first exemplary HSML rule based grammar follows: <hsml:JSGFRule href=“http://www.columbia.edu” tag=“reload” ID=“reload”><![CDATA[public <reload>=(reload) {reload}]]></hsml:JSGFRule>. The first exemplary HSML rule based grammar defines a command, “http://www.columbia.edu”, to be executed if a speech phrase input to the computer matches the phrase “reload.”

Each HSML rule based grammar has three portions which can be configured: the command portion, the tag portion, and the phrase portion. The command portion defines the command to be executed once the HSML rule based grammar is invoked. In the first exemplary HSML rule based grammar, the command portion is as follows: href=“http://www.columbia.edu”. The command for the first exemplary HSML rule based grammar is “http://www.columbia.edu”. The tag portion defines a unique way of identifying the particular HSML rule based grammar rule. In the first exemplary HSML rule based grammar, the tag portion is as follows: tag=“reload”. The tag for the first exemplary HSML rule based grammar is “reload”. The phrase portion defines the set of phrases that will invoke the HSML rule based grammar. In the first exemplary HSML rule based grammar, the phrase portion is as follows: “<![CDATA[public <reload>=(reload) {reload}]]>”. The phrase for the first exemplary HSML rule based grammar is “reload”.
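By way of illustration only, the three portions described above might be extracted from a single rule as in the following Python sketch. The function name parse_rule, the regular expressions, and the use of straight quotation marks around attribute values are assumptions made for this example; they are not part of the described HSML processor, which may parse the markup quite differently.

```python
import re

# Hypothetical helper: pull the three configurable portions out of one
# <hsml:JSGFRule> element. Regular expressions keep the sketch short; a
# real processor would more likely use a proper markup parser.
RULE_RE = re.compile(
    r'<hsml:JSGFRule\b(?P<attrs>[^>]*)>\s*'
    r'<!\[CDATA\[(?P<phrase>.*?)\]\]>\s*</hsml:JSGFRule>',
    re.DOTALL,
)
ATTR_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def parse_rule(markup):
    """Return the command, tag, and phrase portions of the first rule found."""
    match = RULE_RE.search(markup)
    if match is None:
        raise ValueError("no hsml:JSGFRule element found")
    attrs = dict(ATTR_RE.findall(match.group("attrs")))
    return {
        "command": attrs.get("href"),             # command portion
        "tag": attrs.get("tag"),                  # tag portion
        "phrase": match.group("phrase").strip(),  # phrase portion (JSGF text)
    }


rule = parse_rule(
    '<hsml:JSGFRule href="http://www.columbia.edu" tag="reload" ID="reload">'
    '<![CDATA[public <reload>=(reload) {reload}]]></hsml:JSGFRule>'
)
print(rule["command"], rule["tag"], rule["phrase"])
```

A sub-rule that has no href attribute would simply yield a command of None in this sketch.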

Another, more complicated, set of HSML rule based grammars is as follows: <hsml:block> <hsml:JSGFRule href=“http://www.columbia.edu/scripts/cgi/test.cgi” tag=“examprep” ID=“examprep”><![CDATA[public <examprep>=[please] (go to | example) (<rad>|<lab>) report {examprep};]]></hsml:JSGFRule> <hsml:JSGFRule tag=“RAD”><![CDATA[public <rad>=radiology {RAD};]]></hsml:JSGFRule> <hsml:JSGFRule tag=“LAB”><![CDATA[public <lab>=(laboratory|chemistry) {LAB};]]></hsml:JSGFRule></hsml:block>. This grammar provides a nested set of rules. This HSML rule based grammar defines a set of commands, one of which will be executed if a speech phrase input to the computer matches an associated one of a set of phrases. The top level command is defined by the HSML rule “<hsml:JSGFRule href=“http://www.columbia.edu/scripts/cgi/test.cgi” tag=“examprep” ID=“examprep”><![CDATA[public <examprep>=[please] (go to | example) (<rad>|<lab>) report {examprep};]]></hsml:JSGFRule>”, which defines the set of phrases and the top level command “http://www.columbia.edu/scripts/cgi/test.cgi”.

The set of phrases is defined by the following portion of the phrase portion of the second exemplary HSML rule based grammar: [please] (go to | example) (<rad>|<lab>). The phrase starts off with the word “please” within square brackets. A word within square brackets denotes an optional word. The rule will be invoked whether or not the speech phrase begins with the word “please.”

The optional “please” is followed by (go to | example). The parentheses group words together. The words “go to” and “example” are separated by an “|” symbol. The “|” symbol is an “OR” symbol, which indicates that either “go to” or “example” may be used to invoke the rule. Therefore the optional word “please” must be followed by “go to” or “example” to invoke this HSML rule.
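By way of illustration only, the bracket, parenthesis, and “|” notation described above can be mimicked by translating a phrase into a regular expression, as in the following Python sketch. The names phrase_to_regex and accepts are hypothetical, only the small subset of the notation discussed here is handled (no variables such as <rad>), and an actual ASR engine matches speech against grammars by entirely different means.

```python
import re


def phrase_to_regex(phrase):
    """Translate [optional], (a | b) groups, and plain words into a regex.

    Every word is emitted with one trailing space, so the pattern is meant
    to be matched against an utterance normalized the same way.
    """
    parts = []
    for token in re.findall(r"[\[\]()|]|[^\s\[\]()|]+", phrase):
        if token in ("[", "("):
            parts.append("(?:")          # open a group
        elif token == "]":
            parts.append(")?")           # square brackets: optional
        elif token == ")":
            parts.append(")")            # parentheses: plain grouping
        elif token == "|":
            parts.append("|")            # "OR" between alternatives
        else:
            parts.append(re.escape(token) + " ")
    return re.compile("".join(parts), re.IGNORECASE)


def accepts(pattern, utterance):
    """Return True if the whole utterance is matched by the pattern."""
    normalized = "".join(word + " " for word in utterance.split())
    return pattern.fullmatch(normalized) is not None


pattern = phrase_to_regex("[please] (go to | example)")
print(accepts(pattern, "please go to"))  # optional word present
print(accepts(pattern, "example"))       # optional word omitted
print(accepts(pattern, "please"))        # mandatory group missing
```

In this sketch the first two utterances are accepted and the third is rejected, mirroring the statement that the optional word must be followed by one of the mandatory alternatives.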

The mandatory words “go to” or “example” are followed by (<rad>|<lab>). This portion of the phrase contains variables. The variables are defined in sub-rules elsewhere within the HSML block. The variable <rad> is defined by the sub-rule <hsml:JSGFRule tag=“RAD”><![CDATA[public <rad>=radiology {RAD};]]></hsml:JSGFRule>. Therefore the variable <rad> is defined as “radiology.” If the word “radiology” follows either “go to” or “example,” a separator, here a “?”, is concatenated onto the top level command, then the sub-rule tag RAD is concatenated onto the top level command, such that the command becomes “http://www.columbia.edu/scripts/cgi/test.cgi?RAD”. In an alternate exemplary embodiment, the separator can be a “/”. The variable <lab> is defined by the sub-rule <hsml:JSGFRule tag=“LAB”><![CDATA[public <lab>=(laboratory | chemistry) {LAB};]]></hsml:JSGFRule>. The variable <lab> is defined as “laboratory” or “chemistry.” If the word “laboratory” or “chemistry” follows either “go to” or “example,” a separator, here a “?”, is concatenated onto the top level command, then the sub-rule tag LAB is concatenated onto the end of the top level command, such that the command becomes “http://www.columbia.edu/scripts/cgi/test.cgi?LAB”.
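By way of illustration only, the concatenation of a separator and a sub-rule tag onto the top level command might be sketched in Python as follows. The names SUB_RULES, sub_rule_tag, and build_command are hypothetical, and the lookup table simply restates the two sub-rules of the second exemplary grammar.

```python
TOP_LEVEL_COMMAND = "http://www.columbia.edu/scripts/cgi/test.cgi"

# Sub-rule tag -> words that the corresponding variable is defined as.
SUB_RULES = {
    "RAD": ("radiology",),
    "LAB": ("laboratory", "chemistry"),
}


def sub_rule_tag(word):
    """Return the tag of the sub-rule that defines the word, if any."""
    for tag, words in SUB_RULES.items():
        if word.lower() in words:
            return tag
    return None


def build_command(command, word, separator="?"):
    """Concatenate the separator, then the sub-rule tag, onto the command."""
    tag = sub_rule_tag(word)
    if tag is None:
        return command  # no sub-rule matched; leave the command unchanged
    return command + separator + tag


print(build_command(TOP_LEVEL_COMMAND, "radiology"))
print(build_command(TOP_LEVEL_COMMAND, "radiology", separator="/"))
```

The second call corresponds to the alternate exemplary embodiment in which the separator is a “/”.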

TTS grammars define text which can be output as synthetic speech through speakers. One such HSML TTS grammar is as follows: <hsml:JSML><![CDATA[reloading]]></hsml:JSML>. This TTS grammar provides for outputting the phrase “reloading” as synthetic speech through speakers. The TTS grammar can be invoked when an HSML rule based grammar is invoked, when a page is loaded, when a web site is loaded, when a page is exited, or when a web site is exited.

FIG. 2B illustrates a functional block diagram of a system 200 for viewing and interacting with web pages having HSML content. In order to view a web page having HSML content, a user can execute an HSML processor 254, which in turn executes a web browser 256 and an ASR engine 258 on the computer 120. The HSML processor 254 acts as a proxy for the web browser 256. The web browser 256 should have all caching disabled and should support x-mixed replace markup. In an exemplary embodiment, the web browser 256 can be Netscape Navigator. In the present exemplary embodiment the ASR engine 258 can be IBM Viavoice Version 7 Millennium. Other ASR engines that conform to the Java Speech Application Program Interface standard may also be substituted for IBM Viavoice Version 7 Millennium. In a certain embodiment, a script can be used to execute the HSML processor 254 and the web browser 256 individually.

In an alternate embodiment, the web browser 256 can be Microsoft Internet Explorer. In another alternate embodiment, the HSML processor 254 communicates with the web browser 256, such that x-mixed replace markup is not needed to convey control information.

A visual monitor may also be included with the voice enabled browser. The visual monitor aids in the user's interaction with voice activated pages. The visual monitor advantageously includes a status bar displaying an output audio level, an input audio level, a speech engine status indicator, a speech synthesis indicator, a rule identifier, a data status indicator, an error indicator, a page grammar active line, an active grammars area, etc. This allows the user to receive feedback as to how or why his or her speech is or is not recognized. The visual monitor can be implemented in a window separate from the browser or it can be integrated as part of the browser itself. This monitor extends the browser media environment from a graphical and text content environment to a cross platform, dynamically generated, graphical, text and voice interactive environment.

The output audio level informs the user of the setting of the output volume. The input audio level indicates the current volume of the user's voice as the user speaks into the microphone 130 (shown in FIG. 1). The speech engine status indicator informs the user as to the status of the HSML processor 254. The HSML processor 254 can be listening, sleeping or off. If the HSML processor 254 is listening, it is accepting all system and page commands. If the HSML processor 254 is sleeping, it is waiting for a specific system command “wake up” to activate the HSML processor 254, which will cause the speech engine status indicator to change to listening. If the HSML processor 254 is off, no speech commands are accepted. The speech synthesis indicator informs the user whether the HSML processor 254 will react to the HSML TTS grammars associated with the active web page and the system commands, only the system speech output commands, or no speech output commands. The rule identifier displays an invoked HSML rule for a short time after the HSML grammar has been invoked. The data status indicator informs the user whether the HSML processor 254 is loading data, whether the HSML processor 254 is loading speech data, whether the HSML processor 254 has already loaded the visual portion of the web page, or whether the HSML processor 254 has already loaded the web page and speech data. The error indicator informs the user of any errors that have been detected by the HSML processor 254. The page grammar active line informs the user whether a current page grammar exists and is active. The active grammars area lists the HSML grammar rules that are currently active, up to a certain maximum number of HSML grammars. The appearance of these rules in the active grammars area allows users to easily understand the function and layout of a page.
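By way of illustration only, the three states reported by the speech engine status indicator (listening, sleeping, and off) can be modeled as a small state machine, as in the following Python sketch. The names EngineStatus and handle_utterance are hypothetical, and the sketch covers only the acceptance behavior stated above; it does not model how the processor is put to sleep or turned off.

```python
from enum import Enum


class EngineStatus(Enum):
    LISTENING = "listening"
    SLEEPING = "sleeping"
    OFF = "off"


def handle_utterance(status, utterance):
    """Return (new_status, accepted) for one recognized utterance."""
    phrase = " ".join(utterance.lower().split())
    if status is EngineStatus.OFF:
        # Off: no speech commands are accepted.
        return status, False
    if status is EngineStatus.SLEEPING:
        # Sleeping: only the system command "wake up" is acted upon.
        if phrase == "wake up":
            return EngineStatus.LISTENING, True
        return status, False
    # Listening: all system and page commands are accepted.
    return status, True


status, accepted = handle_utterance(EngineStatus.SLEEPING, "wake up")
print(status.value, accepted)
```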

Large rule sets do not need to be displayed in their entirety. A graphical menu does not need to name every HSML rule or HSML rule based grammar. The individual HSML rules or the HSML grammars can indicate the importance level for each rule or grammar such that the visual monitor can intelligently select which rules and grammars are important and should be displayed. When a rule is displayed, the display shows the phrase or phrases that would invoke the rule. For example, if the HSML rule based grammar <hsml:block><hsml:JSGFRule href=“http://www.columbia.edu” tag=“reload” ID=“reload”><![CDATA[public <reload>=(reload) {reload}]]></hsml:JSGFRule></hsml:block> was displayed, the display would show the phrase “reload”. If more than one phrase could invoke the HSML rule based grammar, the display shows all the phrases that could invoke the HSML rule based grammar. In an alternate embodiment, the display shows the phrases that could invoke the HSML rule based grammar, but does not display optional words. For example, if the phrase “please reload” invoked an HSML rule based grammar, but the word “please” was optional, the display would only display the phrase “reload”, not “please reload” and “reload”.

In an alternate embodiment, the HSML processor 254 can be executed separately from the web browser 256, and can interact with a web browser that is already running.

A request for a web page may be initiated by the ASR engine 258 in conjunction with the HSML processor 254, or by the web browser 256. If the ASR engine 258 initiates the request, at least one speech recognition rule based grammar will have been invoked in the ASR engine 258. The ASR engine 258 transmits an array of tags to the HSML processor 254, which in turn issues a request to the web server 102 (shown in FIG. 1). This process is shown in more detail in FIGS. 3A, 3B and 5. Referring to FIG. 2B, if the web browser 256 initiates the request, it transmits the URL to the HSML processor 254. The HSML processor 254 receives the URL from the web browser 256 and issues a request to the web server 102 (shown in FIG. 1). This process is shown in more detail in FIGS. 3A, 3B and 4.

The computer 120 (shown in FIG. 1) receives a response to the request issued by the HSML processor 254 in the form of data representing the web page corresponding to the URL. The computer 120 communicates the data, including markup data, representing the web page corresponding to the URL to the HSML processor 254. The HSML processor 254 processes the markup data and creates, stores and transmits to the ASR engine 258 the rule based grammars and TTS grammars based on the markup data. The process of creating, storing and sending rule based grammars and TTS grammars is described in more detail below in reference to FIGS. 3A and 3B.

In an alternate embodiment, the functionality of the HSML processor 254 may be incorporated into the web browser 256. In another alternate embodiment, the functionality of the HSML processor 254 may be incorporated into the web browser 256 as a module or subroutines. In yet another alternate embodiment, the functionality of the ASR engine 258 may be incorporated into the web browser 256.

FIG. 3A illustrates a thread 300 that monitors messages sent to a particular port of a computer. A thread is not an independent process, but is a thread of execution that belongs to a particular process; here the thread 300 belongs to the HSML processor 254. A single process may have multiple threads. The thread 300 monitors the particular port for a new connection from either the web browser 256 or a thread 500 of the HSML processor 254, shown in FIG. 5. Once the thread 300 receives the new connection from either the web browser 256 or the thread 500, the thread 300 determines whether the new connection includes a data stream. If the new connection includes a data stream, the thread 300 reads the server address from the data stream and issues a request to the web server 102 (shown in FIG. 1). After the request is issued, the thread 300 waits for the data describing the web page from the web server. If the thread 300 receives data from the web server, the thread 300 processes the data, extracting any HSML grammars and transmitting the data to the web browser 256.

The thread 300 begins at step 302 where the thread 300 initializes a recognizer, a temporary variable PREV_VOICE and a temporary variable CUR_VOICE. The thread 300 uses a class library called International Business Machines Java Speech Library (IBMJS Library), version 1.0, which is an implementation of the Java Speech Application Program Interface standard, to interface with the ASR engine 258. In an alternate embodiment, the thread 300 may cause a class library other than IBMJS to communicate with the ASR engine. Other class libraries that conform to the Java Speech Application Program Interface standard may also be substituted for the IBMJS Library. The thread 300 initializes the recognizer using the method createrecognizer of a class used to implement the IBMJS Library interface Recognizer. After the recognizer is initialized, the temporary variable PREV_VOICE is initialized to FALSE, and the temporary variable CUR_VOICE is initialized to FALSE. Once the temporary variables are initialized the thread 300 advances to step 304.

At step 304, the thread 300 waits for a new connection from the web browser 256 (shown in FIG. 2B) or a thread 500 (shown in FIG. 5). The thread 500 is described in greater detail in relation to FIG. 5. The new connection received by the thread 300 will be a data stream. Once the thread 300 receives a new connection from the web browser 256 or the thread 500, the thread 300 advances to step 306.

At step 306, the thread 300 determines if it has received any data from the new connection. If the thread 300 has received data, the thread 300 advances to step 308. If the thread 300 has not received data, the thread 300 returns to step 304.

At step 308, the thread 300 sets a temporary variable DNSTR_COMP to an appropriate value. The temporary variable DNSTR_COMP should be set equal to a value representing the socket associated with the web browser 256 or the thread 500 that initiated the new connection with the thread 300. Having the value of the socket associated with the initiator of the new connection with the thread 300 allows the thread 300 to transmit data to and read data from the initiator of the new connection. Once the temporary variable DNSTR_COMP is set, the thread 300 advances to step 312.

At step 312, the thread 300 disables the current rule based grammars and TTS grammars within the ASR engine 258 (shown in FIG. 2B). Rule based grammars that are associated with any web page and specified in the ASR engine 258 are disabled using a method deleteRuleGrammar of the instance recognizer of a class implementing the IBMJS Library interface Recognizer. Once the existing grammars are disabled, the thread 300 advances to step 314.

At step 314, the server address is read from the data stream received by the thread 300 during the new connection. After the server address is read from the data stream, the thread 300 formats and issues a connection request to the web server 102 (shown in FIG. 1) specifying the server address. Once the request is sent to the web server 102, the thread 300 advances to step 316.

At step 316, the thread 300 begins the parallel execution of a separate thread 400 of the HSML processor 254. The thread 400 is described in greater detail with reference to FIG. 4. As the thread 300 begins the execution of the thread 400, the thread 300 passes the server name read from the data stream to the thread 400. Once the server name is passed to the thread 400, the thread 300 advances to step 318.

At step 318, the thread 300 determines whether the connection to the web server 102 (shown in FIG. 1) is open. In order to read from the web server 102 the connection has to be open. The thread 300 performs a read from the web server 102. If an error is returned, the connection is closed; otherwise the connection is open. If the connection to the web server is open, the thread 300 advances to step 320. If the connection to the web server is not open, the thread 300 advances to step 304.

At step 320, the thread 300 performs a read from the web server 102 (shown in FIG. 1). The thread 300 receives the data describing the requested web page during the read from the web server 102. Once the thread 300 performs the read from the web server 102, the thread 300 advances to step 322.

At step 322, the thread 300 appends or prepends the appropriate header, boundary, or footer to the data read from the web server 102 and sends the information including an appropriate header, boundary or footer to the socket defined by the temporary variable DNSTR_COMP. The conditions under which the thread 300 appends or prepends a header, boundary, or footer are illustrated in a table 350 shown in FIG. 3B. Referring to FIG. 3B, the table 350 includes a group of columns: a current page column 352, a previous page column 354, a current page initiation column 356, a first directive column 358, a second directive column 360, and an append/prepend column 361. The table 350 also includes a group of entries: a first entry 362, a second entry 364, a third entry 366, a fourth entry 368, a fifth entry 370, a sixth entry 372, a seventh entry 374, and an eighth entry 376.

The entries in the current page column 352 can be either a Yes or a No. Having a Yes in the current page column 352 indicates that the current page includes at least one HSML grammar. Having a No in the current page column 352 indicates that the current page does not include any HSML grammars.

The entries in the previous page column 354 can be either a Yes or a No. Having a Yes in the previous page column 354 indicates that the previous page includes at least one HSML grammar. Having a No in the previous page column 354 indicates that the previous page does not include any HSML grammars.

The entries in the current page initiation column 356 can be either Voice or Mouse Click. Having Voice in the current page initiation column 356 indicates that the current page was initiated by a grammar rule. Having Mouse Click in the current page initiation column 356 indicates that the current page was initiated by a mouse click.

The entries in the first directive column 358 and the second directive column 360 can have values equal to Header, Boundary, Footer, Close Socket, and Nothing. Each value stored in the first directive column 358 and the second directive column 360 indicates a particular directive that the thread 300 must transmit to the downstream component specified by the temporary variable DNSTR_COMP. The directive specified in the first directive column 358 should be transmitted before the directive specified in the second directive column 360. For example, the first entry 362 indicates that if the current page has a grammar, the previous page has a grammar and the current page was initiated by voice, a boundary and then nothing should be appended to the data before it is transmitted to the downstream component specified by the temporary variable DNSTR_COMP.

A header, as specified in the first directive column 358 of the second entry 364, the third entry 366, and the fourth entry 368, is a hyper-text transfer protocol (HTTP) header. The HTTP header should include an initial boundary. The HTTP header defines the start of x-mixed replace data. An exemplary HTTP header follows:

    HTTP/1.1 200
    Content-Type: multipart/x-mixed-replace;boundary=123456789
    Cache-Control: no-cache
    --123456789

A boundary, as specified in the first directive column 358 of the first entry 362, and in the second directive column 360 of the second entry 364, the third entry 366, and the fourth entry 368, is an HTTP boundary. The HTTP boundary defines the end of a given section of x-mixed replace data. An exemplary HTTP boundary follows:

    --123456789

A footer, as specified in the first directive column 358 of the fifth entry 370, is an HTTP footer. The HTTP footer defines the end of all x-mixed replace data. An exemplary HTTP footer follows:

    --123456789--

The exemplary HTTP header may define the beginning of x-mixed replace data, the exemplary HTTP boundary may define the end of a portion of the x-mixed replace data, and the exemplary HTTP footer may define the end of all x-mixed replace data. The exemplary HTTP header, the exemplary HTTP boundary, and the exemplary HTTP footer are associated with each other by the number 123456789, which appears in the second line of the exemplary HTTP header, in the exemplary HTTP boundary, and in the exemplary HTTP footer. The web browser 256 renders the display 134 (shown in FIG. 1) attached to the computer 120 (shown in FIG. 1) according to the data describing the web site, which appears after the exemplary HTTP header. If the web browser 256 receives new data from the HSML processor 254, the web browser 256 renders the display 134 according to the new data. If the web browser 256 receives an HTTP boundary, the web browser 256 “wipes” the display 134 attached to the computer 120 upon receipt of new data, and repaints the display 134 according to the data which follows the HTTP boundary. The web browser may also close its socket once it receives an HTTP footer. If the web browser 256 receives an indication that new information needs to be obtained, for example, if a user clicks on a hyper-link, the web browser 256 transmits a URL to the HSML processor 254 (shown in FIG. 2B), and it closes the old socket before opening a new connection.
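By way of illustration only, the three framing strings discussed above may be produced by a small helper such as the following sketch. The class name MixedReplaceFraming and its method names are hypothetical and do not appear in the described embodiment. The sketch also assumes CRLF line endings and a blank line between the header fields and the initial boundary, as HTTP and MIME multipart framing generally require; the exemplary header above does not show these details.

```java
/** Illustrative sketch of x-mixed-replace framing; all names here are hypothetical. */
final class MixedReplaceFraming {
    private static final String CRLF = "\r\n";

    /** HTTP header that opens the multipart stream and ends with the initial boundary. */
    static String header(String boundary) {
        return "HTTP/1.1 200" + CRLF
             + "Content-Type: multipart/x-mixed-replace;boundary=" + boundary + CRLF
             + "Cache-Control: no-cache" + CRLF
             + CRLF                              // assumed: blank line ends the header fields
             + "--" + boundary + CRLF;
    }

    /** Delimiter that ends one section; the browser repaints with the data that follows. */
    static String boundary(String boundary) {
        return CRLF + "--" + boundary + CRLF;
    }

    /** Closing delimiter that ends all of the multipart data. */
    static String footer(String boundary) {
        return CRLF + "--" + boundary + "--" + CRLF;
    }
}
```

The same boundary value, for example 123456789, would be passed to all three methods so that the header, boundary, and footer remain associated with each other.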

A close socket, as specified in the second directive column 360 of the fifth entry 370, closes the socket to the web browser. The HSML processor 254 may not contact the web browser 256 once the socket is closed. The socket is closed by using a method close of the instance socket of the class Socket of the JAVA 2 Platform, Standard Edition, v1.2.2.

The entries in the append/prepend column 361 can have values equal to “before” and “after”. Each value stored in the append/prepend column 361 indicates whether the associated first directive should be appended or prepended to the data being sent to the downstream component specified by the temporary variable DNSTR_COMP. If the entry in the append/prepend column 361 is “after”, the first directive should be appended to the data received from the web server 102 (shown in FIG. 1). If, however, the entry in the append/prepend column 361 is “before”, the first directive should be prepended to the data received from the web server 102. The second directive should always be appended to the data received from the web server 102.
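The append/prepend rule may be sketched as follows. This is a minimal illustration, not the described implementation: the class DirectiveFraming and its parameters are hypothetical, a directive of Nothing is assumed to be represented by an empty string, and the Close Socket directive is not modeled because it is an action on the socket rather than text.

```java
/** Illustrative sketch of the append/prepend rule; all names here are hypothetical. */
final class DirectiveFraming {
    /**
     * @param data        page data read from the web server
     * @param first       text of the first directive ("" is assumed to mean Nothing)
     * @param firstBefore true when the append/prepend column reads "before"
     * @param second      text of the second directive ("" is assumed to mean Nothing)
     */
    static String frame(String data, String first, boolean firstBefore, String second) {
        StringBuilder out = new StringBuilder();
        if (firstBefore) {
            out.append(first).append(data);   // "before": prepend the first directive
        } else {
            out.append(data).append(first);   // "after": append the first directive
        }
        return out.append(second).toString(); // the second directive is always appended
    }
}
```

In either case the first directive precedes the second directive in the transmitted data, which matches the required transmission order.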

The thread 300 determines whether the data received by the thread 300 during the new connection includes any HSML grammars. The thread 300 reads data from the data stream, without removing any data from the data stream, looking for an HSML grammar. After the thread 300 has reviewed the data received from the web server 102, the thread 300 appends or prepends the directives specified in the table 350 onto the data received from the web server 102 depending on whether the previous page contained an HSML grammar, the current page contained an HSML grammar, and how the current page was initiated. Once the directives are appended or prepended, the thread 300 advances to step 324.

At step 324, the thread 300 determines whether the temporary variable CUR_VOICE is equal to TRUE. If the temporary variable CUR_VOICE is equal to TRUE, the thread 300 advances to step 328. If the temporary variable CUR_VOICE is not equal to TRUE, the thread 300 advances to step 326.

At step 326, the thread 300 sets the temporary variable PREV_VOICE equal to FALSE. Once the temporary variable PREV_VOICE is set equal to FALSE, the thread 300 advances to step 330.

At step 328, the thread 300 sets the temporary variable PREV_VOICE equal to TRUE. Once the temporary variable PREV_VOICE is set equal to TRUE, the thread 300 advances to step 330.

At step 330, the thread 300 sets the temporary variable CUR_VOICE equal to FALSE. Once the temporary variable CUR_VOICE is set equal to FALSE, the thread 300 advances to step 332.
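Steps 324 through 330 amount to copying the value of CUR_VOICE into PREV_VOICE and then clearing CUR_VOICE. A minimal sketch follows; the class VoiceFlags and the method rotate are hypothetical names chosen for illustration.

```java
/** Illustrative sketch of steps 324 through 330; all names here are hypothetical. */
final class VoiceFlags {
    boolean prevVoice = false; // corresponds to PREV_VOICE
    boolean curVoice = false;  // corresponds to CUR_VOICE

    /** Records how the page just served was initiated, then resets the current flag. */
    void rotate() {
        if (curVoice) {         // step 324
            prevVoice = true;   // step 328
        } else {
            prevVoice = false;  // step 326
        }
        curVoice = false;       // step 330
    }
}
```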

At step 332, the thread 300 determines whether the data stream, received by the thread 300 during the new connection, includes any HSML grammars. The thread 300 reads data from the data stream, without removing any data from the data stream, looking for an HSML grammar. If the thread 300 finds the beginning of any HSML grammar, HSML grammars are present in the data stream, and the thread 300 advances to step 334. The step 334 is shown in more detail in FIG. 7. Once the step 334 is completed, the thread 300 advances to step 304. If the thread 300 does not find the beginning of any HSML grammar, there are no HSML grammars in the data stream, and the thread 300 advances to step 304.
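One way to look for the beginning of an HSML grammar without removing data from the stream is the mark and reset facility of java.io.BufferedInputStream. The following is a sketch under stated assumptions: the class GrammarPeek, the parameter peekLimit, and the choice of ISO-8859-1 decoding are illustrative, and the text does not say which mechanism the embodiment uses. On a live socket the read would block until peekLimit bytes arrive or the stream ends.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Illustrative sketch of a non-consuming search; all names here are hypothetical. */
final class GrammarPeek {
    static final String BLOCKSET_OPEN = "<hsml:blockset>";

    /**
     * Returns true if the next peekLimit bytes contain the opening blockset tag.
     * The stream position is restored afterwards, so no data is removed.
     */
    static boolean containsHsmlGrammar(BufferedInputStream in, int peekLimit) throws IOException {
        in.mark(peekLimit);                       // remember the current position
        byte[] buf = new byte[peekLimit];
        int total = 0;
        int n;
        while (total < peekLimit && (n = in.read(buf, total, peekLimit - total)) > 0) {
            total += n;
        }
        in.reset();                               // rewind to the marked position
        String text = new String(buf, 0, total, StandardCharsets.ISO_8859_1);
        return text.contains(BLOCKSET_OPEN);
    }
}
```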

FIG. 4 illustrates the thread 400 which begins parallel execution with thread 300 at step 316. Thread 400 receives data from the web browser 256 (shown in FIG. 2B). The thread 400 begins at step 401 by sending a request to the server address including the URL and any parameters. Once the request is sent to the server, the thread 400 advances to step 402.

At step 402, the thread 400 performs a read from the web browser 256. By reading from the web browser, the thread 400 can determine if any additional arguments are specified by the web browser 256. Once the thread 400 performs a read from the web browser 256, the thread 400 advances to step 404.

At step 404, the thread 400 determines whether an interrupt has been received by the thread 400 from thread 500, which is explained in greater detail in connection with FIG. 5. If an interrupt has been received, the thread 400 advances to step 406. If an interrupt has not been received, the thread 400 advances to step 410.

At step 406, the thread 400 closes its connection with the web server 102. Once the thread 400 closes its connection with the web server 102, the thread 400 advances to step 408.

At step 408, the thread 400 ceases further communication with the web browser 256 (shown in FIG. 2B). The thread 400 does not close the connection with the web browser 256. The connection with the web browser 256 remains open and waits for another contact from the HSML processor 254 (shown in FIG. 2B). Once the thread 400 ceases communication with the web browser 256, the thread 400 exits.

At step 410, the thread 400 determines whether additional data has been received from the web browser 256. If data has been read from the web browser 256, the thread 400 advances to step 412. If no data was received from the web browser 256, the thread 400 advances to step 402.

At step 412, the thread 400 transmits the additional arguments received from the web browser 256 to the web server 102 (shown in FIG. 1). The additional arguments should be sent to the web server 102 as arguments associated with the server name that was sent to the web server by the thread 300. After the thread 400 transmits the arguments received from the web browser 256 to the web server 102, the thread 400 advances to step 402.

FIG. 7 illustrates the procedure 334 for creating, storing and sending rule based grammars and TTS grammars in greater detail. The procedure 334, in conjunction with its sub-procedures, parses the markup language of a web page to extract the HSML grammars which define the rule based grammars and the TTS grammars for that web page, stores the rule based grammars and TTS grammars, and transmits the rule based grammars to the ASR engine 258. The procedure 334 itself parses the markup language of the web page searching for an HSML blockset. The HSML blockset is the top level statement of any HSML grammar.

The procedure 334 begins at step 702 by initializing a temporary variable SPEECH, a temporary variable HREF, and a temporary variable ASRCMDS to null. Once the temporary variables SPEECH, HREF, and ASRCMDS are initialized, the procedure 334 advances to step 704.

At step 704 the procedure 334 begins reading the markup language included in the data describing the web page. If the procedure 334 reads a string “<hsml:blockset>”, the procedure 334 advances to step 706. The step 706 is shown in more detail in FIG. 8. Once the step 706 has completed, the procedure 334 advances to step 704. If the procedure 334 does not read a string “<hsml:blockset>”, the procedure 334 advances to step 708.

At step 708, the procedure 334 compares what the procedure 334 has read to EOF, or end of file. EOF designates when a file has come to an end. If the procedure 334 has read the EOF character, the procedure 334 advances to step 710. If the procedure 334 has not read the EOF character, the procedure 334 advances to step 704.

At step 710, the procedure 334 transmits the rule based grammars to the ASR engine 258 (shown in FIG. 2B). The procedure 334 uses a class library called International Business Machines Java Speech Library (IBMJS Library), version 1.0, to interface with the ASR engine 258. The procedure 334 then initializes a reader, which is a JAVA superclass, as a stringreader. The stringreader is a JAVA class and a member of the JAVA superclass reader. The reader is provided with the rule based grammars that shall be supplied to the ASR engine 258. The rule based grammars are stored in the temporary variable ASRCMDS. Once the rule based grammars are stored, the procedure 334 initializes an instance rgGram of the class RuleGrammar, which is an IBMJS class. The instance rgGram is initialized as equal to the output of the method loadJSGF, which is a method of the instance recognizer of a class which implements the IBMJS Library interface Recognizer. The temporary variable ASRCMDS is provided as an argument to the method loadJSGF. The method loadJSGF transmits the rule based grammars to the ASR engine 258. Once the rule based grammars are sent to the ASR engine 258, the procedure 334 advances to the step 712.

At step 712, the procedure 334 specifies a result listener which receives output from the ASR engine 258. The result listener is bound by using the method addResultListener, which is a method of the instance rgGram of a class which implements the IBMJS Library interface RuleGrammar. Once the result listener is initialized, the procedure 334 utilizes the method commitChanges of the class which implements the IBMJS Library interface Recognizer to complete the changes specified to the ASR engine 258 and exits.

FIG. 8 illustrates the procedure 706 for parsing block-sets of HSML grammars in greater detail. The procedure 706 begins at step 802 by reading the markup language included in the data describing the web page. If the procedure 706 reads a string “<hsml:JSML>”, the procedure 706 advances to step 804. The step 804 is shown in more detail in FIG. 10. Once the step 804 has completed, the procedure 706 advances to step 802. If the procedure 706 does not read a string “<hsml:JSML>”, the procedure 706 advances to step 806.

At the step 806, the procedure 706 compares what the procedure 706 has read to the string “<hsml:JSGFRule>”. If the procedure 706 has read the string “<hsml:JSGFRule>”, the procedure 706 advances to step 808. The step 808 is shown in more detail in FIG. 11. Once the step 808 has completed, the procedure 706 advances to step 802. If the procedure 706 has not read the string “<hsml:JSGFRule>”, the procedure 706 advances to step 810.

At the step 810, the procedure 706 compares what the procedure 706 has read to the string “<hsml:block>”. If the procedure 706 has read the string “<hsml:block>”, the procedure 706 advances to step 812. The step 812 is shown in more detail in FIG. 9. Once the step 812 is completed, the procedure 706 advances to step 802. If the procedure 706 has not read the string “<hsml:block>”, the procedure 706 advances to step 814.

At the step 814, the procedure 706 compares what the procedure 706 has read to the string “</hsml:blockset>”. If the procedure 706 has read the string “</hsml:blockset>”, the procedure 706 is complete and the procedure 706 exits. If the procedure 706 has not read the string “</hsml:blockset>”, the procedure 706 advances to step 816.

At the step 816 the procedure 706 reports an error in the markup language of the data describing the web page. If the string “</hsml:blockset>” does not appear at this point, an error should be reported. Once the error is reported, the procedure 706 exits, and in turn the procedure 334 exits.

FIG. 9 illustrates the procedure 812 for parsing blocks of HSML grammars in greater detail. The procedure 812 begins at step 901 where the procedure 812 initializes the temporary variable VOICE_GRAM to FALSE. Once the temporary variable VOICE_GRAM is initialized, the procedure 812 advances to step 902.

At step 902, the procedure 812 reads the markup language included in the data describing the web page. If the procedure 812 reads a string “<hsml:JSML>”, the procedure 812 advances to step 804. The step 804 is shown in more detail in FIG. 10. Once the step 804 is completed, the procedure 812 advances to step 902. If the procedure 812 does not read the string “<hsml:JSML>”, the procedure 812 advances to step 904.

At step 904, the procedure 812 continues to read the markup data included with the data. If the procedure 812 has read a string matching the string “<hsml:JSGFRule>”, the procedure 812 advances to step 905. If the procedure 812 has not read a string matching the string “<hsml:JSGFRule>”, the procedure 812 advances to step 906.

At step 905, the procedure 812 sets the temporary variable VOICE_GRAM equal to TRUE to signify that a rule based grammar is present in the HSML block. Once the temporary variable VOICE_GRAM is set, the procedure 812 advances to step 808. The step 808 is shown in more detail in FIG. 11. Once the step 808 is completed, the procedure 812 advances to step 902.

At step 906, the procedure 812 compares what the procedure 812 has read to the string “</hsml:block>”. If the procedure 812 has read a string matching the string “</hsml:block>”, the procedure 812 advances to step 910. If the procedure 812 has not read a string matching the string “</hsml:block>”, the procedure 812 advances to step 908.

At the step 908 the procedure 812 reports an error in the markup language of the data describing the web page. If the string “</hsml:block>” does not appear at this point, an error in the markup language should be reported. Once the error is reported, the procedure 812 exits, and in turn the processes 334 and 706 exit.

At step 910, the procedure 812 determines whether the temporary variable VOICE_GRAM is equal to TRUE. If the temporary variable VOICE_GRAM is equal to FALSE, the procedure 812 advances to step 912. If the temporary variable VOICE_GRAM is equal to TRUE, the procedure 812 exits.

At step 912, the procedure 812 transmits a Java speech grammar format rule (JSGF rule) corresponding to the string stored in the temporary variable SPEECH to the ASR engine 258. The string stored in the temporary variable SPEECH is provided to a method speak of the instance synthesizer of a class which implements the IBMJS Library interface Synthesizer. Once the JSGF rule is transmitted to the ASR engine 258 (shown in FIG. 2B), the procedure 812 exits.

FIG. 10 illustrates the procedure 804 for parsing HSML TTS grammars in greater detail. The procedure 804 begins at step 1002 by reading the markup language included in the data describing the web page. If the procedure 804 reads a string “<![CDATA[”, the procedure 804 advances to step 1006. If the procedure 804 does not read a string “<![CDATA[”, the procedure 804 advances to step 1004.

At the step 1004 the procedure 804 reports an error in the markup language of the data describing the web page. If the string “<hsml:JSML>” is not followed by the string “<![CDATA[”, an error should be reported. Once the error is reported, the procedure 804 exits, and in turn the processes 334 and 706 exit, and, if running, the procedure 812 exits.

At step 1006, the procedure 804 captures the text of the markup language included in the data describing the web page until the procedure 804 reads a “]”. The captured markup language is stored in the temporary variable SPEECH as a string. Once the captured markup language is stored in the temporary variable SPEECH, the procedure 804 advances to step 1008.

At step 1008, the procedure 804 continues to read the markup language included with the data describing the web site. If the procedure 804 has read a string matching the string “]></hsml:JSML>”, the procedure 804 is complete and the procedure 804 exits. If the procedure 804 has not read a string matching the string “]></hsml:JSML>”, the procedure 804 advances to step 1010.

At the step 1010 the procedure 804 reports an error in the markup language of the data describing the web page. If the string “]” is not followed by the string “]></hsml:JSML>”, an error should be reported. Once the error is reported, the procedure 804 exits, and in turn the processes 334 and 706 exit, and, if running, the procedure 812 exits.
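The capture performed by the procedure 804 may be sketched as a single string operation. The sketch below is illustrative only: the class JsmlBlock and the method extractSpeech are hypothetical names, the CDATA opener is assumed to be the standard XML form “<![CDATA[”, the opening tag is assumed to be immediately followed by that opener with no intervening whitespace, and errors are assumed to be reported by throwing an exception.

```java
/** Illustrative sketch of the TTS grammar capture; all names here are hypothetical. */
final class JsmlBlock {
    static final String OPEN = "<hsml:JSML><![CDATA[";
    static final String CLOSE = "]]></hsml:JSML>";

    /**
     * Returns the text enclosed by the JSML element that starts at offset start,
     * or throws if the markup is malformed (compare steps 1004 and 1010).
     */
    static String extractSpeech(String markup, int start) {
        if (!markup.startsWith(OPEN, start)) {
            throw new IllegalArgumentException("expected " + OPEN + " at offset " + start);
        }
        int textStart = start + OPEN.length();
        int bracket = markup.indexOf(']', textStart);             // step 1006: capture until "]"
        if (bracket < 0 || !markup.startsWith(CLOSE, bracket)) {  // step 1008: check the closing text
            throw new IllegalArgumentException("expected " + CLOSE);
        }
        return markup.substring(textStart, bracket);              // value for the SPEECH variable
    }
}
```

Because the capture stops at the first “]”, text to be spoken cannot itself contain that character under this sketch, which mirrors the behavior described for step 1006.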

FIG. 11 illustrates the procedure 808 for parsing HSML rule based grammars in greater detail. The procedure 808 begins at step 1102 by reading the markup language included in the data describing the web page. If the procedure 808 reads a string “href=”, the procedure 808 advances to step 1104. If the procedure 808 does not read a string “href=”, the procedure 808 advances to step 1106.

At step 1104, the procedure 808 captures the text of the markup language included in the data describing the web page until the procedure 808 reads an entire string bounded by double quotes, for example, in the first example HSML rule based grammar, “www.columbia.edu”. The captured markup language included in the data describing the web page is stored in the temporary variable HREF as a string. Once the captured markup language is stored in the temporary variable HREF, the procedure 808 advances to step 1106.

At step 1106, the procedure 808 continues to read the markup language included with the data describing the web site. If the procedure 808 has read a string matching the string “tag=”, the procedure 808 advances to step 1110. If the procedure 808 has not read a string matching the string “tag=”, the procedure 808 advances to step 1108.

At the step 1108 the procedure 808 reports an error in the markup language of the data describing the web page. If the string “tag=” is not present in the markup language at this point, an error should be reported. Once the error is reported, the procedure 808 exits, and in turn the processes 334 and 706 exit, and, if running, the procedure 812 exits.

At step 1110, the procedure 808 captures the text of the markup language included in the data describing the web page until the procedure 808 reads an entire string bounded by double quotes, for example, in the first example HSML rule based grammar, “reload”. The captured markup language included in the data describing the web page is stored in the temporary variable TAG as a string. After the captured markup language is stored in the temporary variable TAG, the procedure 808 creates a new record in a database located on the data storage unit 126 (shown in FIG. 1). The new record includes a first field, a second field and a third field. The first field is populated with the string stored in the temporary variable TAG, the second field is populated with the string “HREF”, and the third field is populated with the string stored in the temporary variable HREF. After the first database record is populated, the procedure 808 advances to step 1112.

In an alternate embodiment, the string “tag=” and associated tag may be eliminated from the HSML rule, but there should be at least one string “tag=” and associated tag per grammar.

At step 1112, the procedure 808 continues to read the markup language included with the data describing the web site. If the procedure 808 has read a string matching the string “><![CDATA[”, the procedure 808 advances to step 1116. If the procedure 808 has not read a string matching the string “><![CDATA[”, the procedure 808 advances to step 1114.

At the step 1114 the procedure 808 reports an error in the markup language of the data describing the web page. If the string “><![CDATA[” is not present in the markup language at this point, an error should be reported. Once the error is reported, the procedure 808 exits, and in turn the processes 334 and 706 exit, and, if running, the procedure 812 exits.

At step 1116, the procedure 808 captures the text of the markup language included in the data describing the web page until the procedure 808 reads “]”. The captured markup language included in the data describing the web page is stored in the temporary variable ASRCMDS as text, and a carriage return character is appended to the end of the captured text. Once the text is stored in the temporary variable ASRCMDS, the procedure 808 advances to step 1118.

At step 1118, the procedure 808 continues to read the markup language included with the data describing the web site. If the procedure 808 has read a string matching the string “]]></hsml:JSGFRule>”, the procedure 808 advances to step 1122. If the procedure 808 has not read a string matching the string “]]></hsml:JSGFRule>”, the procedure 808 advances to step 1120.

At the step 1120 the procedure 808 reports an error in the markup language of the data describing the web page. If the string “]]></hsml:JSGFRule>” is not present in the markup language at this point, an error should be reported. Once the error is reported, the procedure 808 exits, and in turn the processes 334 and 706 exit, and, if running, the procedure 812 exits.

At step 1122, the procedure 808 compares the temporary variable SPEECH to NULL. If the temporary variable SPEECH is equal to NULL, the procedure 808 advances to step 1126. If the temporary variable SPEECH is not equal to NULL, the procedure 808 advances to the step 1124.

At step 1124, the procedure 808 creates another new record in the database located on the data storage unit 126 (shown in FIG. 1). This new record includes a first field, a second field, and a third field. The first field is populated with the string stored in the temporary variable TAG, the second field is populated with the string “SPEECH”, and the third field is populated with the string stored in the temporary variable SPEECH. After this new record is populated, the procedure 808 advances to step 1126.

At step 1126, the procedure 808 resets some of the temporary variables. The temporary variables SPEECH, TAG and HREF are set equal to NULL. Once these temporary variables are set equal to NULL, the procedure 808 is complete and it exits.
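The effect of the procedure 808 on one rule element may be sketched as follows. This is an illustration under stated assumptions rather than the described implementation: the class JsgfRuleParser is hypothetical, an in-memory list of three-field rows stands in for the database on the data storage unit 126, the whole element is assumed to be available as one string, and all markup checks are performed before any record is written, which differs slightly from the step order given above.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of rule parsing; all names here are hypothetical. */
final class JsgfRuleParser {
    static final String CDATA_OPEN = "><![CDATA[";
    static final String CLOSE = "]]></hsml:JSGFRule>";

    /** Grammar text accumulated for the ASR engine (the ASRCMDS variable). */
    final StringBuilder asrCmds = new StringBuilder();
    /** Stand-in for the database: each row is {tag, kind, value}. */
    final List<String[]> records = new ArrayList<>();

    /** Reads a double-quoted attribute value such as href="..."; returns null if absent. */
    static String attribute(String head, String name) {
        int at = head.indexOf(name + "=\"");
        if (at < 0) return null;
        int from = at + name.length() + 2;
        int to = head.indexOf('"', from);
        if (to < 0) throw new IllegalArgumentException("unterminated " + name);
        return head.substring(from, to);
    }

    /** Parses one rule element; speech is the current SPEECH value or null. */
    void parse(String element, String speech) {
        int headEnd = element.indexOf(CDATA_OPEN);
        if (headEnd < 0) throw new IllegalArgumentException("missing " + CDATA_OPEN); // step 1114
        String head = element.substring(0, headEnd);
        String href = attribute(head, "href");                                        // steps 1102 to 1104
        String tag = attribute(head, "tag");                                          // steps 1106 to 1110
        if (tag == null) throw new IllegalArgumentException("missing tag=");          // step 1108
        int bodyStart = headEnd + CDATA_OPEN.length();
        int bodyEnd = element.indexOf(CLOSE, bodyStart);
        if (bodyEnd < 0) throw new IllegalArgumentException("missing " + CLOSE);      // step 1120
        records.add(new String[] {tag, "HREF", href});                                // step 1110
        asrCmds.append(element, bodyStart, bodyEnd).append('\r');                     // step 1116
        if (speech != null) {
            records.add(new String[] {tag, "SPEECH", speech});                        // step 1124
        }
    }
}
```

For example, a hypothetical element with href “www.columbia.edu”, tag “reload” and a SPEECH value would yield one “HREF” row and one “SPEECH” row for the tag “reload”, and the rule text followed by a carriage return would be added to the ASRCMDS text.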

FIG. 5 illustrates the thread 500 that receives arrays of tags from the ASR engine 258 (shown in FIG. 2B). More particularly, the thread 500 receives arrays of tags from the ASR engine 258, constructs full URLs based upon the arrays of tags received from the ASR engine 258, and transmits the full URLs to the web server. Each tag of an array corresponds to a grammar rule that has been invoked within the ASR engine 258.

The thread 500 begins at step 502 by determining whether any system commands exist within the array of tags received from the ASR engine 258. If any system commands have been received, the thread 500 advances to step 504. If no system commands have been received, the thread 500 advances to step 506.

At step 504, the thread 500 executes any system commands received by the thread 500. System commands are system level commands that are invoked by rule based grammars stored in the ASR engine 258, for example, exit. The rule based grammars associated with system commands are always active in the ASR engine 258 no matter which web page is currently active. In the case of the system command exit, the ASR engine 258 will attempt to match any utterance to the string “exit”, regardless of which web page is currently being viewed. If the rule based grammar associated with the string “exit” is invoked, the ASR engine 258 issues an array of tags, one of which is associated with the system command exit, and the thread 500 executes the system commands. Once the system commands have been executed, the thread 500 advances to step 502.

At step 506, the thread 500 searches the database located on the data storage device 126 (shown in FIG. 1) for database records with a first tag of the list of tags stored in the first field of the database records. If the database contains no records where the first field of the database record matches the first tag of the array of tags, or more than two records where the first field of the database records matches the first tag of the array of tags, the database contains an inappropriate number of database records and the thread 500 advances to step 508. If the database contains one or two records where the first field of the database records matches the first tag of the array of tags, the database contains an appropriate number of database records and the thread 500 advances to step 510.

At the step 508 the thread 500 reports an error in the markup language of the data describing the web page. If the database does not contain any records corresponding to the first tag of the array of tags or the database contains more than two records corresponding to the first tag of the array of tags, an error should be reported. Once the error is reported, the thread 500 exits, and in turn the thread 300 exits.

At step 510, the thread 500 searches the database located on the data storage device 126 (shown in FIG. 1) for a database record with a string matching the first tag of the array of tags stored in the first field of the database record and the string “SPEECH” stored in the second field of the database record. If the thread 500 finds such a database record, the thread 500 advances to step 512. If the thread 500 does not find such a database record, the thread 500 advances to step 514.
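The record checks of steps 506 through 510 may be sketched as follows, again assuming an in-memory list of three-field rows in place of the database on the data storage device 126. The class TagRecords and its method names are hypothetical.

```java
import java.util.List;

/** Illustrative sketch of the record lookups; all names here are hypothetical. */
final class TagRecords {
    /** Counts rows whose first field equals the given tag (step 506). */
    static int countForTag(List<String[]> rows, String tag) {
        int n = 0;
        for (String[] row : rows) {
            if (tag.equals(row[0])) n++;
        }
        return n;
    }

    /** One or two matching rows is appropriate; none or more than two is an error (step 508). */
    static boolean hasAppropriateCount(List<String[]> rows, String tag) {
        int n = countForTag(rows, tag);
        return n == 1 || n == 2;
    }

    /** Returns the third field of the row matching both tag and kind, or null if none (step 510). */
    static String find(List<String[]> rows, String tag, String kind) {
        for (String[] row : rows) {
            if (tag.equals(row[0]) && kind.equals(row[1])) return row[2];
        }
        return null;
    }
}
```

The limit of two rows per tag follows from the record layout: at most one “HREF” row and one “SPEECH” row are expected for any given tag.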

At step 512, the thread 500 transmits a JSGF rule corresponding to the string stored in the third field of the database record found during the step 510 to the ASR engine 258 (shown in FIG. 2B). The string stored in the third field of the database record found during the step 510 is provided to a method speak of the instance synthesizer of a class which implements the IBMJS Library interface Synthesizer. Once the JSGF rule is transmitted to the ASR engine 258, the thread 500 advances to step 514.

At step 514, the thread 500 determines whether the temporary variable PREV_VOICE is set equal to TRUE. If the temporary variable PREV_VOICE is set equal to TRUE, the thread 500 advances to step 518. If the temporary variable PREV_VOICE is not set equal to TRUE, the thread 500 advances to step 516.

At step 516, the thread 500 assigns the value of a temporary variable DNSTR_COMP_2 equal to the value of the temporary variable DNSTR_COMP. The temporary variable DNSTR_COMP_2 now represents the socket of the downstream component, which will be used by a thread 600, described in greater detail in relation to FIG. 6, to transmit data to the web browser. Once the value of the temporary variable DNSTR_COMP_2 is updated, the thread 500 advances to step 518.

At step 518, the thread 500 sets the temporary variable CUR_VOICE equal to TRUE. Once the temporary variable CUR_VOICE is set equal to TRUE, the thread 500 advances to step 520.
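
By way of non-limiting illustration, the flag handling of steps 514 through 518 may be sketched as follows. The temporary variables are held in a dictionary, the second downstream-component variable is written here as DNSTR_COMP_2, and the function name update_voice_state and the placeholder socket value are assumptions of this sketch.

```python
from typing import Any, Dict


def update_voice_state(state: Dict[str, Any]) -> None:
    """Apply steps 514 through 518 to a dictionary of temporary variables."""
    # Step 514: test whether the previous page was voice-initiated.
    if not state["PREV_VOICE"]:
        # Step 516: remember the downstream socket for later use by thread 600.
        state["DNSTR_COMP_2"] = state["DNSTR_COMP"]
    # Step 518: mark the current page as voice-initiated.
    state["CUR_VOICE"] = True


# Usage with a placeholder string standing in for a socket object.
state = {
    "PREV_VOICE": False,
    "CUR_VOICE": False,
    "DNSTR_COMP": "socket-A",
    "DNSTR_COMP_2": None,
}
update_voice_state(state)
```

When PREV_VOICE is already TRUE, the sketch leaves DNSTR_COMP_2 untouched, so the socket recorded for an earlier voice-initiated page continues to be used.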

At step 520, the thread 500 determines whether the thread 400 is currently running. If the thread 400 is currently running, the thread 500 advances to step 522. If the thread 400 is not currently running, the thread 500 advances to step 524.

At step 522, the thread 500 transmits an interrupt to thread 400. The interrupt will cause the thread 400 to exit without closing its connection with the web browser 256. Once the interrupt is sent to the thread 400, the thread 500 advances to step 524.

At step 524, the thread 500 searches the database located on the data storage device 126 (shown in FIG. 1) for a database record with the first tag of the array of tags stored in the first field of the database record and the string “HREF” stored in the second field of the database record. If the thread 500 finds such a database record, the thread 500 stores the string stored in the third field of the database record in a temporary variable and appends any tags received from the ASR engine 258 onto the string stored in the temporary variable. Once the tags are appended, the thread 500 transmits the string stored in the temporary variable to the thread 300. After the string is sent to thread 300, the thread 500 advances to step 526.
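
By way of non-limiting illustration, the construction of the string sent to the thread 300 at step 524 may be sketched as follows. The description does not state which tags are appended or how they are joined; the sketch assumes that the tags following the first tag are concatenated directly onto the stored string. The record layout, the function name build_request and the example URL are likewise assumptions.

```python
from typing import List, Optional, Tuple

# Hypothetical record layout: (tag, type, value).
Record = Tuple[str, str, str]


def build_request(records: List[Record], tags: List[str]) -> Optional[str]:
    """Return the HREF string for tags[0] with the remaining tags appended."""
    first_tag = tags[0]
    for tag, kind, value in records:
        if tag == first_tag and kind == "HREF":
            # Assumption: remaining tags are appended by plain concatenation.
            return value + "".join(tags[1:])
    return None  # no HREF record found for the first tag


# Usage with made-up data.
request = build_request(
    [("search", "HREF", "http://www.example.com/search?q=")],
    ["search", "weather"],
)
```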

At step 526, the thread 500 causes the thread 600 to begin execution. Once the thread 600 begins execution, the thread 500 exits.

FIG. 6 illustrates a thread 600 that receives and transmits data sent to the web browser 256 (shown in FIG. 2B) associated with web pages which were invoked by the ASR engine 258. The thread 600 begins at step 602 by determining whether a data stream has been received by the thread 600. If a data stream has been received, the thread 600 advances to step 604. If a data stream has not been received, the thread 600 advances to step 608.

At step 604, the thread 600 transmits the data contained within the data stream received from thread 300 to the socket specified in the temporary variable DNSTR_COMP_2. Once the data received by the thread 600 is transmitted to the web browser 256, the thread 600 advances to step 606.

At step 606, the thread 600 determines whether the browser socket specified by the temporary variable DNSTR_COMP_2 is closed. If the thread 600 received an error when it attempted to transmit data to the browser socket at step 604, the browser socket is closed and the thread 600 exits. If the thread 600 did not receive an error, the browser socket is open and the thread 600 returns to step 602.

At step 608, the thread 600 ceases further communication with the web browser 256. The thread 600 does not close the connection with the web browser 256. The connection with the web browser 256 remains open and waits for contact from another process. Once the thread 600 ceases communication with the web browser 256, the thread 600 exits.
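
By way of non-limiting illustration, the loop formed by steps 602 through 608 may be sketched as follows. The sketch assumes that incoming data streams arrive on a queue, that the check at step 602 does not block, and that a closed browser socket surfaces as an OSError on transmission. The class FakeSocket and the function run_thread_600 are stand-ins introduced for this sketch.

```python
import queue
from typing import List, Optional


class FakeSocket:
    """Stand-in for the browser socket; optionally fails after N sends."""

    def __init__(self, fail_after: Optional[int] = None) -> None:
        self.sent: List[bytes] = []
        self.fail_after = fail_after

    def sendall(self, data: bytes) -> None:
        if self.fail_after is not None and len(self.sent) >= self.fail_after:
            raise OSError("browser socket is closed")
        self.sent.append(data)


def run_thread_600(streams: "queue.Queue[bytes]", browser_socket: FakeSocket) -> str:
    """Forward queued data to the browser socket until none remains."""
    while True:
        try:
            data = streams.get_nowait()      # step 602: has data been received?
        except queue.Empty:
            # Step 608: stop communicating but leave the socket open.
            return "ceased"
        try:
            browser_socket.sendall(data)     # step 604: transmit to the browser
        except OSError:
            return "socket closed"           # step 606: error means socket closed
        # Step 606, no error: loop back to step 602.


# Usage: two pending data streams, socket stays open throughout.
pending: "queue.Queue[bytes]" = queue.Queue()
pending.put(b"<html>")
pending.put(b"</html>")
open_socket = FakeSocket()
result = run_thread_600(pending, open_socket)
```

Note that the sketch never closes the socket on the "ceased" path, which mirrors the requirement that the connection with the web browser remain open for another process.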

In an alternate embodiment, the HSML processor 254 (shown in FIG. 2B) may receive data describing a web page with multiple frames included within that web page. Each frame may include a different set of HSML rule based grammars and HSML TTS grammars. The HSML processor 254 can be configured to monitor multiple sets of HSML rule based grammars and HSML TTS grammars.

1. A computer-implemented method for providing a visual web page having an audio interface, comprising one or more computer processors performing the following: receiving, at a browser, a first web page transmitted from a server, the first web page including a page mark-up that includes both first components parseable into a first visual web page and second components parseable into a first rule based grammar generated irrespective of content of the first components, the second components that are parseable into the first rule based grammar being specified by a speech markup language; traversing the first web page; during the traversal of the first web page, identifying which of the components of the first web page are the first components parseable into the first visual web page and which of the components of the first web page are the second components parseable into the first rule based grammar; parsing the first components into a display of the first visual web page; parsing the second components into the first rule based grammar; and loading the first rule based grammar into memory in association with the first visual web page.
2. The method of claim 1, wherein each rule of the first rule based grammar includes a first command portion identifying a command to be performed when the respective rule of the first rule based grammar is invoked, a first phrase portion identifying at least one phrase which invokes the respective rule of the first rule based grammar, and a first tag portion which uniquely identifies the respective rule of the first rule based grammar.
3. The method of claim 2, wherein: the page mark-up of the first web page further includes third components parseable into a speech grammar which includes rules for outputting audio; each rule of the speech grammar is associated with a tag portion; and a particular rule of the speech grammar is loaded into the memory in association with a particular rule of the first rule based grammar by use of a same tag portion by the particular rules of the speech grammar and the first rule based grammar.
4. The method of claim 3, further comprising: responsive to receipt, while the first visual web page is active, of a voice command matching the first phrase portion of the particular rule of the first rule based grammar, executing the command of the particular rule of the first rule based grammar and outputting the audio of the particular rule of the speech grammar.
5. The method of claim 2, wherein the first phrase portion of each of at least one rule of the first rule based grammar identifies a respective plurality of phrases which invoke the respective rule, at least one of the respective plurality of phrases being different from any text of the first visual web page via interaction with which the respective rule is invocable.
6. The method of claim 5, further comprising: displaying a list of the respective plurality of phrases.
7. The method of claim 1, further comprising: determining whether to add one or more directives to data of the received first web page, and, where it is determined that one or more directives are to be added, whether the one or more directives are to be appended or prepended, wherein the determinations are based on whether a current received web page includes a grammar, on whether a previously received web page includes a grammar, and on a method via which the current page was initiated.
8. The method of claim 1, wherein the second components include a component that provides a speech-invoked reload instruction for reloading the first visual web page.
9. The method of claim 1, wherein each of at least a subset of rules of the first rule based grammar includes a start tag, an association of a Uniform Resource Locator, at least one speech element for invoking a command, and an end tag.
10. The method of claim 1, further comprising: opening a connection to the browser from the server; and sending the first text content to the browser over the connection.
11. The method of claim 10, wherein the connection is a persistent connection.
12. The method of claim 11, further comprising: parsing a received voice command in accordance with the first rule based grammar; requesting a second web page from the server responsive to the voice command; receiving a second data from the server, the second data specifying the second web page; parsing the second data into a second text content and a second rule based grammar; loading the second rule based grammar; and sending the second text content to the browser over the connection.
13. The method of claim 12, wherein the voice command is an instruction to the server.
14. The method of claim 1, wherein the first text content includes markup language.
15. The method of claim 14, wherein the markup language is Hypertext Markup Language.
16. The method of claim 1, wherein the first rule based grammar includes a first display priority portion.
17. The method of claim 1, further comprising: receiving from an automated speech recognition engine an indication a first grammar rule has been invoked by a voice command, the first grammar rule included in the first rule based grammar; and transmitting the voice command to a web server.
18. The method of claim 1, further comprising: displaying an output audio level.
19. The method of claim 1, further comprising: displaying a first phrase portion of the first rule based grammar.
20. The method of claim 1, further comprising: parsing the first data into a first speech based grammar.
21. The method of claim 20, further comprising: audibly outputting the first speech based grammar at a browser displaying the first web page.
22. The method of claim 21, wherein the outputting occurs responsive to receiving a voice command.
23. The method of claim 1, wherein the parsing occurs at a browser.
24. The method of claim 1, wherein the grammar components are given by binary data.
25. A computer-implemented method for displaying a visual web page having an audio interface, comprising one or more computer processors performing the following: receiving, at a browser and from a web server, a first web page including a page mark-up that includes both first components parseable into a first visual web page and second components parseable into a first rule based grammar generated irrespective of content of the first components, wherein the first rule based grammar is specified by a speech markup language; opening a connection between the browser and a module; traversing the first web page; during the traversal, identifying which of the components of the first web page are the first components parseable into the first visual web page and which of the components of the first web page are the second components parseable into the first rule based grammar; parsing the second components into the first rule based grammar; loading the first rule based grammar into a memory accessible by the module in association with the first visual web page; sending the first components to the browser over the connection; displaying the first visual web page at the browser; processing a received voice command at the module in accordance with the first rule based grammar; and displaying, at the browser, a second visual web page received responsive to the command.
26. The method of claim 25, wherein the voice command is processed at an automated speech recognition engine.
27. The method of claim 25, wherein the first text content and the second text content include Hypertext Markup Language.
28. The method of claim 25, further comprising: loading a second rule based grammar received with the second text content.
29. The method of claim 25, further comprising: parsing first data into a first speech based grammar.
30. The method of claim 29, further comprising: audibly outputting the first speech based grammar at the browser.
31. The method of claim 30, wherein the outputting occurs in response to receiving a voice command.
32. The method of claim 25, wherein the processing occurs at the browser.
33. A system for providing a visual web page having an audio interface, comprising: an input arrangement for receiving, from a server, a first web page including a page mark-up that includes first components including verbal components parseable into a visual web page and second components parseable into a first rule based grammar generated irrespective of content of the first components, wherein the second components that are parseable into the first rule based grammar are specified by a speech markup language; a browser for traversing the first web page, wherein, during the traversal of the first web page, the browser identifies which of the components of the first web page are the first components parseable into the first visual web page and which of the components of the first web page are the second components parseable into the first rule based grammar; a module for parsing the identified second components into the first rule based grammar, the first rule based grammar including rules that each includes a first command portion identifying a command to be performed upon invocation of the respective rule and a first tag portion which uniquely identifies the respective rule; and a database for storing at least a portion of the first rule based grammar in association with the first visual web page.
34. The system of claim 33, wherein the data specifying the web page includes Hypertext Markup Language.
35. The system of claim 33, wherein the module processes a voice command in accordance with the first rule based grammar.
36. The system of claim 35, wherein the input receives a second rule based grammar responsive to the voice command.
37. The system of claim 33, wherein the first text content data further specifies a first speech based grammar associated with the web page.
38. The system of claim 37, further comprising: an output configured to audibly output a first speech output in accordance with the first speech based grammar.
39. The system of claim 38, wherein the first speech output is outputted in response to receiving a voice command.
40. The system of claim 33, wherein the module is included in the browser.
41. A computer-implemented method to provide a visual web page having an audio interface, comprising one or more computer processors performing the following: receiving, at a browser, a web page transmitted from a server, the web page including a page markup that includes both first markup components specifying, describing, and parseable into a visual web page and second markup components specifying, describing, and parseable into: a rule based grammar generated irrespective of content of the first markup components, a structure of the rule based grammar, and associated web resources of the rule based grammar; traversing the web page; during the traversal of the web page, identifying which of the components of the web page are the first markup components and which of the components of the web page are the second markup components; parsing the first components into a display of the visual web page; parsing the second components into relations including the rule based grammar and web resources; associating the web resources with rules of the rule based grammar as specified by the markup; and loading the rule based grammar and associated resources into memory in relation to the visual web page.
42. The method of claim 41, further comprising: responding to user interaction with the visual web page in accordance with the loaded rule based grammar to provide an updated display.